Introduction

A common problem is that of separating different classes of events in a given
sample. One may want to separate some "signal" from one or more "background"
sources, or simply distinguish between different classes of signal events.
There are several instances where one cannot or does not wish to separate by
means of cutting, and instead wants to do a statistical separation. This means
being able to calculate the number of events of each category present in the
given sample, and possibly to measure some other characteristics of each
class, without explicitly labeling each individual event as belonging to a
particular category. For this to be possible, one needs some observables that
have different distributions for each class of events.

The purpose of this note is to define some criteria for quantifying the
resolution achievable in statistical separation, given the distributions of the
observables used for this purpose. One can use this to:

• quote the separation power of an observable in a compact way 

• quickly evaluate the expected resolution on extracting the fractions of 
events in each category before actually performing any fit 

• decide the optimal variables to use in separation when there are several 
choices 
Separating contributions



Suppose your sample contains n different classes of events, each contributing
a fraction $f_i$ of the total, and let x be some observable (which may be
multidimensional) that is supposed to distinguish between those events. The
probability distribution of x for our sample will be:

$$ P_{tot}(x|f) = \sum_{i=1}^{n} f_i P_i(x) $$   (1)

where $P_i(x)$ is the pdf of x for events of type i, and it is assumed here to
be perfectly known (any uncertainty in the $P_i(x)$ would contribute a
systematic uncertainty to the final results).

The most basic information one wishes to extract from the sample of data at
hand is the values of the fractions $f_i$; we can therefore take the
resolution in extracting the $f_i$ as the measure of the separating power of
the observable x.

The sum of all $f_i$ must be 1 in order for the overall distribution to be
correctly normalized, so there are actually only n - 1 free parameters to be
evaluated; let us arbitrarily put $f_n = 1 - \sum_{i=1}^{n-1} f_i$.

The resolution in estimating the $f_i$ can in principle be measured by
setting up a Maximum Likelihood fit procedure, and repeating it on a
sufficient number of Monte Carlo samples to evaluate the spread of results
around the input values. You can also look at the resolutions returned by your
favorite fitter program, but it is important to remember that those numbers
are only approximate estimates of the actual resolution achieved, especially
when statistics are low and/or the likelihood function is less than regular,
so it is useful to be able to calculate them independently. This is also a
good cross-check that the fit is actually doing what you want and that its
error estimates are sound.
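
The toy Monte Carlo procedure just described can be sketched as follows. This is only a minimal illustration under assumptions that are not part of the note: two unit-width gaussian classes, and hypothetical helper names (`gauss_pdf`, `fit_fraction`, `toy_spread`). The likelihood is maximized by bisection on its derivative, which is adequate here because the log-likelihood of a two-component mixture is concave in f.

```python
import math
import random
import statistics


def gauss_pdf(x, mean, sigma=1.0):
    """Normal density; stands in for the known P_i(x) (assumed shape)."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fit_fraction(p1_vals, p2_vals, iterations=40):
    """ML estimate of f in P_tot = f*P_1 + (1-f)*P_2, restricted to [0, 1].

    The log-likelihood is concave in f, so its derivative is decreasing
    and the maximum can be located by bisection.
    """
    def dlogl_df(f):
        return sum((a - b) / (f * a + (1.0 - f) * b)
                   for a, b in zip(p1_vals, p2_vals))

    lo, hi = 0.0, 1.0
    if dlogl_df(lo) <= 0.0:
        return lo
    if dlogl_df(hi) >= 0.0:
        return hi
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if dlogl_df(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def toy_spread(f_true, distance, n_events, n_toys, seed=12345):
    """Mean and standard deviation of the fitted f over many toy samples."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_toys):
        # Each event is of class 1 with probability f_true, else class 2.
        xs = [rng.gauss(0.0 if rng.random() < f_true else distance, 1.0)
              for _ in range(n_events)]
        p1_vals = [gauss_pdf(x, 0.0) for x in xs]
        p2_vals = [gauss_pdf(x, distance) for x in xs]
        estimates.append(fit_fraction(p1_vals, p2_vals))
    return statistics.mean(estimates), statistics.stdev(estimates)


if __name__ == "__main__":
    mean_f, spread_f = toy_spread(f_true=0.3, distance=2.0,
                                  n_events=400, n_toys=100)
    print(f"mean fitted f = {mean_f:.3f}, spread = {spread_f:.3f}")
```

The spread returned by `toy_spread` is the quantity to compare with the error quoted by a fitter; with only 100 toys it is itself known to no better than several percent.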

A standard way to evaluate the resolution expected from a measurement
before actually carrying it out is to look at the Minimum Variance Bound [1]:

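$$ \mathrm{cov}(\mu_i, \mu_j) = \left[ E\left( -\frac{\partial^2 \ln L}{\partial \mu_i \, \partial \mu_j} \right) \right]^{-1}_{ij} $$   (2)

where L is the likelihood function of the sample and the $\mu_i$ are the
parameters being estimated;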

this is an upper bound to the precision that can be achieved, whatever
the estimation procedure used. Whenever the problem is sufficiently regular,
the ML estimator in fact gets very close to this limit.

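Luckily enough, the MVB for our problem can be written down in a pretty
simple form: the covariance matrix of the n - 1 independent $f_i$ parameter
estimates is:

$$ \mathrm{cov}(f_i, f_j) = \frac{1}{N} \left[ \int \frac{(P_i(x) - P_n(x)) \, (P_j(x) - P_n(x))}{P_{tot}(x|f)} \, dx \right]^{-1}_{ij} $$   (3)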



(remember that the fraction $f_n$ associated to distribution $P_n(x)$ is
determined from the other $f_i$). Note that in this formula the symbol x may
stand for a set of many variables, discrete and/or continuous, and the
integrals extend over the whole x domain.
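
As an illustration, the covariance matrix of eq. 3 can be evaluated numerically for a three-class sample, where the matrix to invert is only 2x2. The sketch below makes several assumptions that are not in the note: the classes are one-dimensional unit-width gaussians, the integral is approximated with a midpoint rule on a finite range, and the function names (`gauss_pdf`, `mvb_covariance_3`) are hypothetical.

```python
import math


def gauss_pdf(x, mean, sigma=1.0):
    """Normal density; stands in for the known P_i(x) (assumed shape)."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def mvb_covariance_3(means, fractions, n_events, x_min, x_max, steps=20000):
    """Eq. 3 for three gaussian classes: 2x2 covariance of (f_1, f_2).

    f_3 is not a free parameter; it is fixed by f_3 = 1 - f_1 - f_2.
    """
    f1, f2, f3 = fractions
    dx = (x_max - x_min) / steps
    info = [[0.0, 0.0], [0.0, 0.0]]  # per-event information matrix
    for k in range(steps):
        x = x_min + (k + 0.5) * dx  # midpoint rule
        p1, p2, p3 = (gauss_pdf(x, m) for m in means)
        p_tot = f1 * p1 + f2 * p2 + f3 * p3
        if p_tot <= 0.0:
            continue  # numerically empty region contributes nothing
        d = (p1 - p3, p2 - p3)
        for i in range(2):
            for j in range(2):
                info[i][j] += d[i] * d[j] / p_tot * dx
    # Explicit inverse of the 2x2 matrix, then scale by 1/N.
    det = info[0][0] * info[1][1] - info[0][1] * info[1][0]
    inv = [[info[1][1] / det, -info[0][1] / det],
           [-info[1][0] / det, info[0][0] / det]]
    return [[inv[i][j] / n_events for j in range(2)] for i in range(2)]


if __name__ == "__main__":
    cov = mvb_covariance_3(means=(0.0, 1.5, 3.0), fractions=(0.5, 0.3, 0.2),
                           n_events=1000, x_min=-8.0, x_max=11.0)
    print("sigma(f_1) =", math.sqrt(cov[0][0]))
    print("sigma(f_2) =", math.sqrt(cov[1][1]))
    print("correlation =", cov[0][1] / math.sqrt(cov[0][0] * cov[1][1]))
```

When the three means are moved far apart, the result should approach the multinomial covariance, with variances $f_i(1-f_i)/N$ and covariances $-f_i f_j/N$, which is a convenient check of the numerical integration.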

For a 2-component sample, there is only one fraction $f = f_1$ to be
evaluated, and the result is particularly simple:

$$ \sigma^2(f) = \frac{1}{N} \left[ \int \frac{(P_1(x) - P_2(x))^2}{f P_1(x) + (1-f) P_2(x)} \, dx \right]^{-1} $$   (4)

This is the quantity you want to minimize in order to achieve the best 
possible statistical separation. 
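
For completeness, the two-component result can be obtained from the Minimum Variance Bound in a few steps. The sketch below assumes N independent events $x_k$ and writes the single-parameter bound as the inverse of the expected second derivative of the log-likelihood; the symbols L and $x_k$ are introduced here only for illustration.

```latex
\begin{align*}
\ln L(f) &= \sum_{k=1}^{N} \ln P_{tot}(x_k|f),
  \qquad P_{tot}(x|f) = f P_1(x) + (1-f) P_2(x) \\
\frac{\partial^2 \ln P_{tot}(x|f)}{\partial f^2}
  &= -\frac{(P_1(x) - P_2(x))^2}{P_{tot}(x|f)^2} \\
E\left[ -\frac{\partial^2 \ln L}{\partial f^2} \right]
  &= N \int P_{tot}(x|f) \, \frac{(P_1(x) - P_2(x))^2}{P_{tot}(x|f)^2} \, dx
   = N \int \frac{(P_1(x) - P_2(x))^2}{P_{tot}(x|f)} \, dx \\
\sigma^2(f)
  &= \left( E\left[ -\frac{\partial^2 \ln L}{\partial f^2} \right] \right)^{-1}
   = \frac{1}{N} \left[ \int \frac{(P_1(x) - P_2(x))^2}{f P_1(x) + (1-f) P_2(x)} \, dx \right]^{-1}
\end{align*}
```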

In the limiting case of the different classes of events being totally
separated in x, that is, the $P_i(x)$ having zero overlap, the uncertainty on
f comes just from the statistical fluctuations of the distribution of the
events amongst classes due to finite sample size, and eq. 4 becomes:

$$ \sigma^2_{best}(f) = \frac{f (1 - f)}{N} $$   (5)



which is the familiar result from the Binomial distribution. 
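
The reduction to the binomial result can be checked explicitly. The following is a short worked derivation, assuming only that $P_1(x) P_2(x) = 0$ everywhere (zero overlap) and that each pdf is normalized to 1:

```latex
\begin{align*}
\int \frac{(P_1(x) - P_2(x))^2}{f P_1(x) + (1-f) P_2(x)} \, dx
  &= \int_{P_1 > 0} \frac{P_1(x)^2}{f P_1(x)} \, dx
   + \int_{P_2 > 0} \frac{P_2(x)^2}{(1-f) P_2(x)} \, dx \\
  &= \frac{1}{f} + \frac{1}{1-f} = \frac{1}{f (1-f)} \\
\sigma^2_{best}(f)
  &= \frac{1}{N} \left[ \frac{1}{f (1-f)} \right]^{-1} = \frac{f (1-f)}{N}
\end{align*}
```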

It is particularly convenient to use the ratio of the limit resolution (5) to
the resolution (4), in order to quote the separation power of the observable
x as a dimensionless quantity:

$$ s = \frac{\sigma_{best}(f)}{\sigma(f)} = \sqrt{ f (1 - f) \int \frac{(P_1(x) - P_2(x))^2}{P_{tot}(x|f)} \, dx } $$   (6)



This is independent of the sample size N, and tells you at a glance the
power of the observable x in separating the samples, from 0 (no separation)
to 1 (absolute maximum achievable with the given sample). This quantity
is more informative than common expressions like "n-sigma separation" or
"curves overlap by xxx%", as it tells you exactly how good the observable
x is at separating the events, and it is valid whatever the shape and the
dimensionality of the distributions involved.

Examples 

A simple and common example is the separation between two 1-dimensional
gaussian distributions with the same sigma. The above quantity s is easily
evaluated by numerical integration. Note that s, as generally happens for
resolutions, depends on the true value of the fraction f. Figure 1 shows s as
a function of the distance, in units of sigma, between the mean values of the
two gaussians, and the different curves are for different values of f. From
this graph you can read, for instance, that a separation of 1 sigma between
roughly equally populated samples gives you a resolution on the relative
fractions slightly more than a factor of two (1/0.45) worse than ideal; that
is to say, the sample is statistically equivalent to a fully separated sample
of size smaller by a factor 0.2 = 0.45^2.
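
The numerical integration mentioned above might look like the following sketch. It assumes unit-width gaussians, a midpoint rule over a finite range (10 sigma beyond each mean), and a hypothetical function name `separation_power`; none of these choices come from the note.

```python
import math


def gauss_pdf(x, mean, sigma=1.0):
    """Normal density used for both classes (assumed unit width)."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def separation_power(distance, f, half_range=10.0, steps=20000):
    """Separation power s for two gaussians whose means differ by `distance`.

    Evaluates s = sqrt( f (1-f) * integral of (P_1 - P_2)^2 / P_tot dx ).
    """
    x_min = -half_range
    x_max = distance + half_range
    dx = (x_max - x_min) / steps
    integral = 0.0
    for k in range(steps):
        x = x_min + (k + 0.5) * dx  # midpoint rule
        p1 = gauss_pdf(x, 0.0)
        p2 = gauss_pdf(x, distance)
        p_tot = f * p1 + (1.0 - f) * p2
        if p_tot > 0.0:
            integral += (p1 - p2) ** 2 / p_tot * dx
    return math.sqrt(f * (1.0 - f) * integral)


if __name__ == "__main__":
    distances = (0.5, 1.0, 2.0, 3.0, 4.0)
    for f in (0.1, 0.3, 0.5):
        row = ", ".join(f"{separation_power(d, f):.2f}" for d in distances)
        print(f"f = {f}: s = {row}")
```

For a 1 sigma distance and f = 0.5 this should return a value close to the 0.45 quoted in the text, and it should tend to 0 for coincident gaussians and to 1 for widely separated ones.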
Figure 1: Separation power between two gaussians, as a function of their
distance